Improving Middleware Performance with AdOC: an Adaptive Online Compression Library for Data Transfer
In this article, we present the AdOC (Adaptive Online Compression) library, a user-level set of functions that enables data transmission with compression. Compression is performed dynamically during transmission, and the compression level is constantly adapted to the environment. To ease the integration of AdOC into existing software, the API closely mirrors the UNIX read and write system calls and respects their semantics. Moreover, the library is thread-safe and has been ported to many UNIX-like systems. We have tested AdOC under various conditions and with various data types. Results show that the library outperforms the POSIX read/write system calls on a broad range of networks (up to 100 Mbit/s LANs), while on Gigabit Ethernet it provides similar performance
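The adaptation idea can be sketched as follows. This is an illustrative toy, not the actual AdOC API: the function name, the backlog signal, and the level mapping are all assumptions. The compression level rises when the sender is backlogged (a slow network) and falls when buffers drain quickly.

```python
import zlib

# Hypothetical sketch of an AdOC-style adaptive write: map the sender's
# backlog (number of buffers still waiting to go out) to a zlib level,
# so a slow network gets stronger compression and a fast one gets less.
def adaptive_write(send, data, backlog, max_backlog=8):
    # 0 pending buffers -> level 0 (no compression); full queue -> level 9.
    level = min(9, (backlog * 9) // max_backlog)
    payload = zlib.compress(data, level) if level > 0 else data
    send(level, payload)
    return level

sent = []
lvl = adaptive_write(lambda l, p: sent.append((l, p)), b"x" * 1000, backlog=8)
# A full backlog selects the strongest level, and the payload round-trips.
```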
Symbolic Mapping and Allocation for the Cholesky Factorization on NUMA machines: Results and Optimizations
We discuss performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL
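One common way to express such a tile-to-node mapping is a 2D block-cyclic distribution; the sketch below is illustrative only (it is not claimed to be the paper's or PLASMA's actual policy) and shows how each lower-triangular tile of a tiled Cholesky factorization could be assigned to a NUMA node, so that node can first-touch the memory it will mostly work on.

```python
# Illustrative 2D block-cyclic mapping of tiles to NUMA nodes (an assumption,
# not the paper's exact scheme): tile (i, j) goes to node (i mod p, j mod q)
# on a p-by-q grid of NUMA nodes.
def tile_owner(i, j, p, q):
    """Return the NUMA node index owning tile (i, j) on a p-by-q node grid."""
    return (i % p) * q + (j % q)

# Lower-triangular tiles of a 4x4 tile matrix on a 2x2 grid of NUMA nodes.
owners = {(i, j): tile_owner(i, j, 2, 2) for i in range(4) for j in range(i + 1)}
```

The block-cyclic layout spreads the tiles of each row and column across nodes, which balances both memory traffic and work.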
Adaptive Online Data Compression
Transmitting huge data sets quickly, in the context of distributed computing over wide-area networks, can be achieved by compressing the data before transmission. However, this approach is not efficient on high-speed networks: the time to compress a large file and send it exceeds the time to send the uncompressed file. In this paper, we propose an algorithm that overlaps communication with compression and adapts the compression ratio to the network speed (the slower the network, the more efficient, and slower, the compression algorithms we use). The advantages of such an adaptive algorithm are its generality and its suitability for a large set of applications
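The selection rule described here (slower network, stronger compression) can be sketched as picking the strongest level whose compression throughput still exceeds the measured network bandwidth, so compression can overlap transmission without becoming the bottleneck. The throughput figures below are made-up placeholders, not measurements from the paper.

```python
# Hedged sketch of the adaptation rule: choose the strongest compression
# level that still compresses at least as fast as the network can send.
# (zlib level, assumed compression throughput in MB/s) -- placeholder numbers.
LEVELS = [(9, 10), (6, 40), (1, 150), (0, float("inf"))]

def pick_level(network_mbps):
    """Strongest level whose (assumed) throughput keeps up with the network."""
    for level, throughput in LEVELS:
        if throughput >= network_mbps:
            return level
    return 0
```

On a 5 MB/s link this picks level 9; on a gigabit-class link it falls back to sending uncompressed, matching the behavior the abstract describes.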
New Dynamic Heuristics in the Client-Agent-Server Model
MCT is a widely used heuristic for scheduling tasks onto grid platforms. However, when dealing with many tasks, scheduling a new task with MCT tends to dramatically delay the completion times of already mapped tasks. In this paper, we propose heuristics based on two features: a historical trace manager that simulates the environment, and a perturbation metric that quantifies the impact a newly allocated task has on already mapped tasks. Our simulations and experiments in a real environment show that the proposed heuristics outperform MCT
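As background, the MCT baseline can be sketched in a few lines; the representation below (one ready time per server, per-server task durations) is our simplification, not the paper's model.

```python
# Minimal sketch of MCT (Minimum Completion Time): each server is summarized
# by the time at which it becomes free; the new task goes to the server that
# would finish it earliest. A perturbation-aware variant would additionally
# penalize choices that delay already-mapped tasks.
def mct(ready_time, duration):
    """ready_time, duration: dicts server -> float; return the chosen server."""
    return min(ready_time, key=lambda s: ready_time[s] + duration[s])

ready_time = {"a": 10.0, "b": 3.0}
duration = {"a": 2.0, "b": 5.0}
chosen = mct(ready_time, duration)  # "b" finishes at 8.0 vs "a" at 12.0
```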
DKPN: A Composite Dataflow/Kahn Process Networks Execution Model
To address the high level of dynamism and variability in modern streaming applications (e.g. video decoding), as well as the difficulties in programming heterogeneous MPSoCs, we propose a novel execution model based on both dataflow and Kahn process networks. This paper presents the semantics and properties of this hierarchical and parametric model, called DKPN. Parameters are classified, and we show that hints can be derived from them to improve execution. We also present a scheduler framework and scheduling policies that support the model. Experiments illustrate the benefits of our approach
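For readers unfamiliar with the Kahn side of the model, here is a toy network (not DKPN itself): processes communicate only over FIFO channels and block on reads, which is what makes a Kahn network's output deterministic regardless of scheduling order.

```python
from queue import Queue

# Toy Kahn-process-network example (illustrative only, not the DKPN model):
# two processes connected by FIFO channels; reads block, writes do not.
def producer(out, n):
    for i in range(n):
        out.put(i)

def doubler(inp, out, n):
    for _ in range(n):
        out.put(inp.get() * 2)   # blocking read: Kahn semantics

a, b = Queue(), Queue()
producer(a, 3)
doubler(a, b, 3)
result = [b.get() for _ in range(3)]
```

Because each process's output depends only on its input history, running the two processes in threads, or interleaved in any order, would produce the same `result`.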
On the complexity of task graph scheduling with transient and fail-stop failures
This paper deals with the complexity of task graph scheduling with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. Our main result is that this problem is #P-complete (hence at least as hard as NP-complete problems), for both transient and fail-stop processor failures. We also study the complexity of a restricted class of schedules, where a task cannot be scheduled before all replicas of all its predecessors have completed their execution
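The "easy" unreplicated case can be made concrete: if failures are independent and each task must merely survive its own execution, the schedule's reliability is a product over tasks. The exponential-failure assumption below is ours, chosen for illustration; the paper's failure model may differ.

```python
import math

# Sketch of the easy case: without replication, a schedule succeeds iff every
# task runs to completion, so (under independent exponential failures with
# rate lam on the processor running the task) reliability is a simple product.
def schedule_reliability(tasks):
    """tasks: list of (duration, lam) pairs; return the success probability."""
    return math.prod(math.exp(-lam * d) for d, lam in tasks)
```

With replication this factorization breaks down, because a single processor failure can kill several replicas at once; that correlation is what drives the #P-completeness result.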
Process Affinity, Metrics and Impact on Performance: an Experimental Study
Process placement, also called topology mapping, is a well-known strategy to improve parallel program execution by reducing the communication cost between processes. It requires two inputs: the topology of the target machine and a measure of the affinity between processes. In the literature, the dominant affinity measure is the communication matrix, which describes the amount of communication between processes. The goal of this paper is to study the accuracy of the communication matrix as a measure of affinity. We have run an extensive set of tests on two fat-tree machines and a 3D-torus machine to evaluate several hypotheses that are often made in the literature and to discuss their validity. First, we check the correlation between algorithmic metrics and application performance. Then, we check whether a good generic process placement algorithm ever degrades performance. Finally, we examine whether the structure of the communication matrix can be used to predict the gain
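The affinity measure under study can be sketched directly; the trace format below (source rank, destination rank, byte count) is an assumption for illustration, not a format from the paper.

```python
# Minimal sketch of a communication matrix: entry m[src][dst] accumulates the
# bytes sent from rank src to rank dst, collected from a hypothetical trace.
def comm_matrix(trace, nprocs):
    m = [[0] * nprocs for _ in range(nprocs)]
    for src, dst, nbytes in trace:
        m[src][dst] += nbytes
    return m

trace = [(0, 1, 100), (1, 0, 50), (0, 1, 25)]
M = comm_matrix(trace, 2)
```

A placement algorithm then tries to put pairs of ranks with large entries on cores that are close in the machine topology; the paper's question is how well this matrix actually predicts performance.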
Improving MPI Applications Performance on Multicore Clusters with Rank Reordering
Modern hardware architectures, featuring multiple cores and a complex memory hierarchy, raise challenges that need to be addressed by parallel application programmers. It is therefore tempting to adapt an application's communication pattern to the characteristics of the underlying hardware. The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder the MPI process ranks so as to match the application communication pattern to the hardware topology. Experimental results on a multicore cluster show that improvements can be achieved as long as the application communication pattern is expressed by a relevant metric
Scheduling on the Grid : Historical Trace and Dynamic Heuristics
We present a historical trace manager and new dynamic scheduling heuristics that can be used, and are studied, in the client-agent-server model on the grid. These heuristics rely on common knowledge of the characteristics of the tasks submitted to the agent, as well as on the historical trace built from the tasks submitted to each server. We study each heuristic and compare them, on several metrics, to an instantiation of MCT (Minimum Completion Time), chosen as the reference heuristic. The simulation experiments we have conducted show that they are likely to give good results when tested in a real environment
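A historical trace manager of the kind described here can be sketched as follows; the class and method names are ours, and the model (one FIFO queue per server, known task durations) is a deliberate simplification.

```python
from collections import defaultdict

# Sketch of a historical trace manager (illustrative naming): the agent keeps,
# per server, when that server's queue drains, and re-simulates it to predict
# when a newly submitted task would complete.
class TraceManager:
    def __init__(self):
        self.busy_until = defaultdict(float)

    def estimate_completion(self, server, arrival, duration):
        start = max(self.busy_until[server], arrival)
        return start + duration

    def record(self, server, arrival, duration):
        self.busy_until[server] = self.estimate_completion(server, arrival, duration)

tm = TraceManager()
tm.record("s1", arrival=0.0, duration=4.0)
eta = tm.estimate_completion("s1", arrival=1.0, duration=2.0)  # queued until 4.0
```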